(C) 2017-2019 by Damir Cavar
Version: 1.1, November 2019
This is a tutorial related to the discussion of word sense disambiguation and various machine learning strategies discussed in the textbook Machine Learning: The Art and Science of Algorithms that Make Sense of Data by Peter Flach.
This tutorial was developed as part of my course material for the course Machine Learning for Computational Linguistics in the Computational Linguistics Program of the Department of Linguistics at Indiana University.
Importing wordnet from the NLTK module:
In [1]:
from nltk.corpus import wordnet
Asking for the synsets of a word in WordNet:
In [2]:
wordnet.synsets('dog')
Out[2]:
A synset is identified by a 3-part name of the form word.pos.nn. Except for the last synset, all the synsets of dog above are nouns, marked with the part-of-speech tag n. We can restrict the query to a specific part of speech (PoS):
In [3]:
wordnet.synsets('dog', pos=wordnet.VERB)
Out[3]:
Besides VERB, the other parts of speech are NOUN, ADJ, and ADV.
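For example, we can restrict the same query to nouns:
In [ ]:
wordnet.synsets('dog', pos=wordnet.NOUN)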
We can select a specific synset from the list using the full 3-part name notation:
In [4]:
wordnet.synset('dog.n.01')
Out[4]:
For this particular synset we can fetch the definition:
In [6]:
print(wordnet.synset('dog.n.01').definition())
Synsets may also have usage examples. We can count the number of examples for this particular synset as follows:
In [7]:
len(wordnet.synset('dog.n.01').examples())
Out[7]:
We can print out the first example using:
In [8]:
print(wordnet.synset('dog.n.01').examples()[0])
We can also output the lemmata for a specific synset:
In [9]:
wordnet.synset('dog.n.01').lemmas()
Out[9]:
Using a list comprehension, we can convert this list of Lemma objects into a plain list of lemma names:
In [10]:
[str(lemma.name()) for lemma in wordnet.synset('dog.n.01').lemmas()]
Out[10]:
We can also reference a specific lemma directly:
In [11]:
wordnet.lemma('dog.n.01.dog')
Out[11]:
The current version of WordNet in NLTK is multilingual. To see which languages are supported, use this command:
In [12]:
sorted(wordnet.langs())
Out[12]:
We can ask for the lemma names of a synset in another language, for example Mandarin Chinese (language code cmn):
In [16]:
wordnet.synset('dog.n.01').lemma_names('cmn')
Out[16]:
We can also look up a word in another language, for example the Italian word cane, and fetch the lemmas (and thus the synsets) it is linked to:
In [17]:
wordnet.lemmas('cane', lang='ita')
Out[17]:
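For convenience, we can store the synset dog.n.01 in a variable: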
In [ ]:
dog = wordnet.synset('dog.n.01')
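We can now ask for its hypernyms, that is, the more general concepts above it in the WordNet hierarchy: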
In [ ]:
dog.hypernyms()
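Its hyponyms are the more specific concepts below it: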
In [ ]:
dog.hyponyms()
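The member holonyms are the groups that this synset is a member of: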
In [ ]:
dog.member_holonyms()
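We can also follow the hypernym relation all the way up to the most general synset(s) at the top of the hierarchy: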
In [ ]:
dog.root_hypernyms()
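Given two synsets, for example dog and cat, we can compute their lowest common hypernym, that is, the most specific concept they share: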
In [ ]:
wordnet.synset('dog.n.01').lowest_common_hypernyms(wordnet.synset('cat.n.01'))
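Lemmas are also connected by lexical relations such as antonymy. We first select the adjective synset for good: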
In [ ]:
good = wordnet.synset('good.a.01')
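and then ask for the antonyms of its first lemma: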
In [ ]:
good.lemmas()[0].antonyms()